<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>
6.824 Spring 2021 Paper Questions
</title>

<!-- To add a new question, just put in within a <div> tag, give it
some identifier (i.e., 'ID="q-XX"'), and then add it to the questions
array variable below.  To link directly to the question, just use a
link to 'questions.html?q=q-XX'.
 -->

<script src="common.js"></script>

<style>
div.questionbox {
    margin: 1pc 4% 0pc 4%;
    padding: 0.5pc 0.5pc 0.5pc 0.5pc;
    background-color: #e0e0ff;
    border: 1px dashed red;
}
</style>

</head>

<body bgcolor="#ffffff" text="#00000" onLoad="page_onload()">

<div align="center">
<h2>
<a href="index.html">6.824</a> Spring 2021 Paper Questions
</h2>
</div>

<a name="top"></a>
<p>
  For each paper, your assignment is two-fold.  Before the start of
  lecture discussing the paper:

<ul>
<li>Submit your answer for each lecture's paper question via the
    <a href="../2021/handin.py/handin.py.html">submission
    web site</a>, and
<li>Submit your own question about the paper (e.g., what you find most
    confusing about the paper or the paper's general context/problem).
    You cannot use the question below.  To the extent possible, during
    lecture we will try to answer questions submitted the evening before.
</ul>

</p>

<p>
You can also upload your questions and answers using curl:
<pre>
## Answer goes into lecN.txt
$ curl -F file=@lec2.txt \
       -F key=XXXXXXXX \
       https://6824.scripts.mit.edu/2021/handin.py/upload
## Question goes into sqN.txt
$ curl -F file=@sq2.txt \
       -F key=XXXXXXXX \
       https://6824.scripts.mit.edu/2021/handin.py/upload
</pre>
<p>

<div id="questions">

<div class="questionbox" ID="q-gointro">
  <p>
   The assigned reading for today is not a paper, but
   the <a href="http://tour.golang.org/">Online Go tutorial</a>.  The assigned
   "question" is the
   <a href="https://tour.golang.org/concurrency/10">Crawler exercise</a> in the
   tutorial.  Also, take a look at
   Go's <a href="https://golang.org/pkg/net/rpc/">RPC package</a>, which you
   will use in lab 1.
</div>

<div class="questionbox" ID="q-gfs">
  <p>
    Describe a sequence of events that would result in a client reading stale
    data from the <a href="papers/gfs.pdf">Google File System</a>.
  </p>
</div>

<div class="questionbox" ID="q-vm-ft">
  <p>
    How does <a href="papers/vm-ft.pdf">VM FT</a> handle network partitions?  That
    is, is it possible that if the primary and the backup end up in different
    network partitions that the backup will become a primary too and the system
    will run with two primaries?
  </p>
</div>

<div class="questionbox" ID="q-concurrency">
  <p>
    Consider the following code from the "incorrect synchronization" examples:
  </p>
  <pre>
var a string
var done bool

func setup() {
	a = "hello, world"
	done = true
}

func main() {
	go setup()
	for !done {
	}
	print(a)
}
  </pre>
  <p>Using the synchronization mechanisms of your choice, fix this code so it
  is guaranteed to have the intended behavior according to the Go language
  specification. Explain why your modification works in terms of the
  happens-before relation.</p>
</div>

<div class="questionbox" ID="q-raft">
<p>
Suppose we have the scenario shown in the Raft paper's Figure 7: a cluster of seven
servers, with the log contents shown. The first server crashes (the one
at the top of the figure), and
cannot be contacted. A leader election ensues.
For each of the servers marked (a), (d), and (f), could that server
be elected?
If yes, which servers would vote for it? If no, what specific
Raft mechanism(s) would prevent it from being elected?
</div>

<div class="questionbox" ID="q-raft2">
<p>
Could a received InstallSnapshot RPC cause the state machine to go
backwards in time? That is, could step 8 in Figure 13 cause the state
machine to be reset so that it reflects fewer executed operations? If
yes, explain how this could happen. If no, explain why it can't happen.
</div>

<div class="questionbox" ID="q-spinnaker">
  <p>
    Please read the paper's Appendices. In Spinnaker a leader to responds to a client request after the leader
    and <i>one</i> follower have written a log record for the request on
    persistent storage. Why is this sufficient to guarantee strong
    consistency even after the leader or the one follower fail?
  </p>
  <p>
   (This paper relies on Zookeeper, which we will read later.)
  </p>
</div>

<div class="questionbox" ID="q-QAlab">
  <p>
    The lecture today is an Q&A session about the last submitted lab
    (e.g., lab 1 or lab2A+B). Submit a question about the lab: for
    example, something you wondered about while doing the lab,
    something you didn't understand, or just anything.
</div>

<div class="questionbox" ID="q-zookeeper">
  <p>
    One use of <a href="papers/zookeeper.pdf">Zookeeper</a> is as a fault-tolerant
    lock service (see the section "Simple locks" on page 6).  Why isn't possible
    for two clients to acquire the same lock?  In particular, how does
    Zookeeper decide if a client has failed and it can give the client's locks to 
    other clients?
  </p>
</div>

<div class="questionbox" ID="q-craq">
  <p>
    Item 4 in Section 2.3 says that, if a client read request arrives
    and the latest version is dirty, the node should ask the tail for
    the latest committed version. Suppose, instead, that the node
    replied with its most recent clean version (ignoring any dirty
    version and not sending a version query to the tail). This change
    would cause reads to reflect the most recent committed write that
    the node is aware of. Explain how this could lead to violations of
    linearizability -- or violations of the paper's goal of strong
    consistency.
  </p>
</div>

<div class="questionbox" ID="q-cr">
  <p>
    Give an example scenario where a client could observe incorrect
    results (i.e., non-linearizable) if the head of the chain would
    return a response to the client as soon as it received an
    acknowledgment from the next server in the chain (instead of
    the tail responding).
  </p>
</div>

<div class="questionbox" ID="q-aurora">
  <p>
The second paragraph of Section 4.1 says "The runtime state maintained
by the database lets us use single segment reads rather than quorum
reads..." What runtime state does the database need to maintain in
order to avoid having to read from a quorum?
  </p>
</div>

<div class="questionbox" ID="q-go">
<p>
Russ Cox is one of the leads on the Go project. What do you like best about Go?
Why?  Would you want to change anything in the language? If so, what and why?
</div>

<div class="questionbox" ID="q-chapter9">
<p>
<a href="../resources/res-6-004-principles-of-computer-system-design-an-introduction-spring-2009/online-textbook/online-textbook.html">6.033 Book</a>.
Read just these parts of Chapter 9: 9.1.5, 9.1.6, 9.5.2, 9.5.3, 9.6.3. The
last two sections (on two-phase locking and distributed two-phase
commit) are the most important.
The Question: describe a situation where Two-Phase Locking yields higher performance than Simple Locking.
</div>

<div class="questionbox" ID="q-spanner">
<p><a href="papers/spanner.pdf">Spanner</a>
Suppose a Spanner server's TT.now() returns correct
information, but the uncertainty is large. For example,
suppose the absolute time is 10:15:30, and TT.now()
returns the interval [10:15:20,10:15:40]. That interval
is correct in that it contains the absolute time, but the
error bound is 10 seconds.
See Section 3 for an explanation TT.now().
What bad effect will a large error bound have on Spanner's operation?
Give a specific example.
</div>

<div class="questionbox" ID="q-farm">
<p>
<a href="papers/farm-2015.pdf">No compromises: distributed
transactions with consistency, availability, and performance</a>:
Suppose there are two FaRM transactions that both increment the same
object. They start at the same time and see the same initial value for
the object. One transaction completely finishes committing (see
Section 4 and Figure 4). Then the second transaction starts to commit.
There are no failures. What is the evidence that FaRM will use to
realize that it must abort the second transaction? At what point in
the Section 4 / Figure 4 protocol will FaRM realize that it must
abort?
</div>

<div class="questionbox" ID="q-mdcc">
<p><a href="papers/mdcc.pdf">MDCC</a>
Come back later for this year's question.
</div>


<div class="questionbox" ID="q-spark">
<p>
<a href="papers/zaharia-spark.pdf">Resilient Distributed Datasets: A
  Fault-Tolerant Abstraction for In-Memory Cluster Computing</a>
What applications can Spark support well that MapReduce/Hadoop cannot support?
</div>

<div class="questionbox" ID="q-naiad">
<p><a href="papers/naiad.pdf">Naiad: A Timely Dataflow System</a>:
Consider the data-flow graph in Figure 3, and assume that the vertices are
instantiated as follows:<br />
<ul>
  <li><b>A</b> is a distinct-by-key operator, which for inputs of <tt>(key,
      value)</tt> only emits the <em>first</em> <tt>(key, value)</tt> record
      with each distinct key.
  <li><b>B</b> is a multiply operator, which multiplies the value by two
      (i.e., for <tt>(key, value)</tt>, it emits <tt>(key, value * 2)</tt>.
  <li><b>C</b> is a conditional operator, which sends its output to
      <b>E</b> if the input <tt>value</tt> is greater than 10, and sends it
      to <b>F</b> otherwise.
  <li><b>D</b> is an aggregation operator that sums all values irrespective
      of key.
</ul>
<p>Now assume that we introduce two records in epoch <em>e</em> = 1 at the
<b>In</b> vertex: <tt>(a, 2)</tt> and <tt>(b, 6)</tt>; and one record in
<em>e</em> = 2: <tt>(a, 5)</tt>.</p>
<p>Write down the timestamp changes that the records experience as they flow
through the graph, e.g., <tt>(a, 2): at A, t = (1, []); at B, t = ...</tt>.
You may omit vertices that do not modify the timestamp.</p>
<p>Informally explain when <tt>E.OnNotify((1, []))</tt> will be called.
You do not need to step through the details of the pointstamp protocol.</tt>
</div>

<div class="questionbox" ID="q-parameter">
<p><a href="papers/parameter.pdf">Parameter Server</a>:
The parameter server model was developed for running machine-learning
algorithms like sparse logistic regression (&sect; 5.1).</p>
<p>What do you expect the bottleneck &ndash; that is, the most loaded &ndash;
resource (e.g., CPU, memory, network) to be at the <b>workers</b> and at
the <b>servers</b>?</p>
<p>What resource utilization would you likely observe if using Spark's
logistic regression (see &sect;3.2.1 in the Spark paper) instead
of the parameter server for the experiment in &sect;5.1?
</div>

<div class="questionbox" ID="q-frangipani">
<p>
<a href="papers/thekkath-frangipani.pdf">Frangipani: A Scalable
Distributed File System</a>: Suppose a server modifies an i-node,
appends the modification to its log, then another server modifies the
same i-node, and then the first server crashes. The recovery system
will see the i-node modification in the crashed server's log, but
should not apply that log entry to the i-node, because that would
un-do the second server's change. How does Frangipani avoid or cope
with this situation?
</div>

<div class="questionbox" ID="q-memcached">
<p>
<a href="papers/memcache-fb.pdf">Memcache at Facebook</a>.
Section 3.3 implies that a client that writes data does not
delete the corresponding key from the Gutter servers, even
though the client does try to delete the key from the ordinary
Memcached servers (Figure 1). Explain why it would be a bad
idea for writing clients to delete keys from Gutter servers.
</div>

<div class="questionbox" ID="q-cops">
<p>
<a href="papers/cops.pdf">COPS</a>. The last sentence in Section 4.3
says a client clears its context after a put, replacing the context
with just the put. The text observes "This put depends on all previous
key-version pairs and thus is nearer than them." Why does clearing the
context and replacing it with just the put make sense? You might think
that the client's subsequent puts would need to carry along the
dependency information about previous gets. What entity ultimately
uses the context information, and why does it not need the information
about gets before the last put?
</div>


<div class="questionbox" ID="q-ct">
<p>
Explain why it's important that all viewers of a Certificate
Transparency log see the same content. What attacks would be possible
if a malicious log server was able to cause some readers of its log to
see one sequence of records, and other readers to see a different set
of records? What mechanisms ensure that all viewers see the same log
content?
</div>

<div class="questionbox" ID="q-blockstack">
<p>
Why is it important that Blockstack names be unique, human-readable,
and decentralized? Why is providing all three properties hard?
</div>


<div class="questionbox" ID="q-bayou">
<p>
<a href="papers/bayou-conflicts.ps">Managing Update Conflicts in Bayou</a>
  Suppose we build a distributed filesystem using Bayou, and the
  system has a copy operation. Initially, file A contains "foo"
  and file B contains "bar". On one node, a user copies file A
  to file B, overwriting the old contents of B. On another node, a
  user copies file B to file A. After both operations are committed, we
  want both files to contain "foo" or for both files to contain
  "bar". Sketch a dependency check and merge procedure for the
  copy operation that makes this work. How does Bayou ensure that
  all the nodes agree about whether A and B contain "foo" or
  "bar"?
</div>

<div class="questionbox" ID="q-chord">
<p>
For the design described in <a href="papers/stoica-chord.pdf">Chord: a scalable
  peer-to-peer lookup service for Internet applications</a>, if two Chord nodes
  x and y are located nearby each other in the Internet (e.g., they are both in
  the same datacenter), is the number of Chord hops small (say 1 or 2 instead of
  log N) when x is looking up a key that is stored on node y?
</p>
</div>

<div class="questionbox" ID="q-dynamo">
<p><a href="papers/dynamo.pdf">Dynamo</a>
Suppose Dynamo server S1 is perfectly healthy with a working network
connection. By mistake, an administrator instructs server S2 to remove
S1 using the mechanisms described in 4.8.1 and 4.9. It takes a while
for the membership change to propagate from S2 to the rest of the
system (including S1), so for a while some clients and servers will
think that S1 is still part of the system. Will Dynamo operate
correctly in this situation? Why, or why not?
</div>

<div class="questionbox" ID="q-sundr">
<p>
<a href="papers/li-sundr.pdf">Secure Untrusted Data Repository (SUNDR)</a>
In the simple straw-man, both fetch and modify operations are placed
in the log and signed. Suppose an alternate design that only signs and
logs modify operations. Does this allow a malicious server to break
fetch-modify consistency or fork consistency? Why or why not?
</div>

<div class="questionbox" ID="q-bitcoin">
<p>
<a href="papers/bitcoin.pdf">Bitcoin</a>
Try to buy something with Bitcoin. It may help to cooperate
with some 6.824 class-mates, and it may help to start a
few days early.
If you decide to give up, that's OK. Briefly describe your experience.
</div>

<div class="questionbox" ID="q-analogic">
<p>
<a href="papers/katabi-analogicfs.pdf">Experiences with a Distributed,
 Scalable, Methodological File System: AnalogicFS</a>. In many ways,
 this experiences paper raises more questions than it answers. Please
 answer one of the following questions, taking into consideration the
 rich history of AnalogicFS and the spirit in which the paper was
 written:

<p>
a) The analysis of A* search shown in Figure 1 claims to be an
introspective visualization of the AnalogicFS methodology; however,
not all decisions are depicted in the figure. In particular, if I
<= P, what should be the next node explored such that all assumptions
in Section 2 still hold? Show your work.

<p>
b) Despite the authors' claims in the introduction that AnalogicFS was
developed to study SCSI disks (and their interaction with lambda
calculus), the experimental setup detailed in Section 4.1 involves
decommissioned Gameboys instead, which use cartridge-based, Flash-like
memory. If the authors had used actual SCSI disks during the
experiments, how exactly might have their results changed
quantitatively?

<p>
c) AnalogicFS shows rather unstable multicast algorithm popularity
(Figure 5), especially compared with some of the previous systems
we've read about in 6.824. Give an example of another system that
would have a more steady measurement of popularity pages, especially
in the range of 0.1-0.4 decibels of bandwidth.

<p>
d) For his 6.824 project, Ben Bitdiddle chose to build a
variant of Lab 4
that faithfully emulates the constant expected seek time across
LISP machines, as AnalogicFS does. Upon implementation, however, he
immediately ran into the need to cap the value size to 400 nm, rather
than 676 nm. Explain what assumptions made for the AnalogicFS
implementation do not hold true for Lab 4, and why that changes the
maximum value size.

</div>


<!-- OLD, ARCHIVED QUESTIONS FOLLOW -->

<!--<div class="questionbox" ID="q-ficus">
<p><a href="papers/ficus.pdf">Ficus</a>

Imagine a situation like the paper's Figure 1, but in which only Site
A updates file Foo. What should Ficus do in that case when the
partition is merged? Explain how Ficus could tell the difference
between the situation in which both Site A and Site B update Foo,
and the situation in which only Site A updates Foo.
</div>

<div class="questionbox" ID="q-ivy">
<p>
<a href="papers/li-dsm.pdf">Memory Coherence in Shared Virtual
Systems</a>. Answer this question using
<a href="notes/ivy-code.txt">ivy-code.txt</a>, which is a version of
the code in Section 3.1 with some clarifications and bug fixes. You
can see that the manager part of the WriteServer sends out invalidate
messages, and waits for confirmation messages indicating that the
invalidates have been received and processed. Suppose the manager sent
out invalidates, but did not wait for confirmations. Describe a
scenario in which lack of the confirmation would cause the system to
behave incorrectly. You should assume that the network delivers all
messages, and that none of the computers fail.
</div>

<div class="questionbox" ID="q-treadmarks">
<p>
<a href="papers/keleher-treadmarks.pdf">Distributed Shared Memory on
  Standard Workstations and Operating Systems</a>
Suppose that a simplified version of Treadmarks, called Dreadmarks,
simply sent all modifications of variables between an acquire and a
release to the next processor to acquire the same lock. No other
modifications are sent. What changes does Treadmarks send that
Dreadmarks does not?  Outline a specific simple situation in which
Treadmarks would provide more useful or intuitive memory behavior than
Dreadmarks.
</div>

<div class="questionbox" ID="q-munin">
<p>
<a href="papers/munin.pdf">Implementation and Performance of Munin</a> The SOR
application (see Section 4.2) with the producer-consumer pattern performs
substantial better than with the write-shared pattern (see Table 6 on page
161). Why is the producer-consumer pattern a better fit for SOR than write-shared?
</div>

<div class="questionbox" ID="q-harp">
<p>
<a href="papers/bliskov-harp.pdf">Replication in the Harp File System</a>
Figures 5-1, 5-2, and 5-3 show that Harp often finishes benchmarks
faster than a conventional non-replicated NFS server. This may be
surprising, since you might expect Harp to do strictly more work than a
conventional NFS server (for example, Harp must manage the replication).
Why is Harp often faster? Will all NFS operations be faster with Harp
than on a conventional NFS server, or just some of them? Which?
</div>

<div class="questionbox" ID="q-fds">
<p>
<a href="papers/fds.pdf">Flat Datacenter Storage</a> Suppose
tractserver T1 is temporarily unreachable due to a network problem, so
the metadata server drops T1 from the TLT.
Then the network problem goes away, but for a while the metadata
server is not aware that T1's status has changed. During this time
could T1 serve client
requests to read and write tracts that it stores? If yes, give an
example of how this could happen. If no, explain what mechanism(s)
prevent this from happening.
</div>

<div class="questionbox" ID="q-paxos">
<p>
<a href="papers/paxos-simple.pdf">Paxos Made Simple</a>
Suppose that the acceptors are <i>A</i>, <i>B</i>,
and <i>C</i>. <i>A</i> and <i>B</i> are also
proposers.
How does Paxos
ensure that the following sequence of events <u>can't</u> happen?
What actually happens, and which value is ultimately chosen?

<ol>
<li> <i>A</i> sends prepare requests with proposal number 1, and gets responses
  from <i>A</i>, <i>B</i>, and <i>C</i>.
<li> <i>A</i> sends <tt><nobr>accept(1, "foo")</nobr></tt>
  to <i>A</i> and <i>C</i> and gets responses from both.
  Because a majority accepted, <i>A</i> thinks
  that <tt>"foo"</tt> has been chosen.
  However, <i>A</i> crashes before sending an <tt>accept</tt> to <i>B</i>.
<li><i>B</i> sends prepare messages with proposal number 2, and gets responses
  from <i>B</i> and <i>C</i>.
<li><i>B</i> sends <tt><nobr>accept(2, "bar")</nobr></tt> messages
  to <i>B</i> and <i>C</i> and gets responses
  from both, so <i>B</i> thinks that <tt>"bar"</tt> has been chosen.
</ol>
</div>

<div class="questionbox" ID="q-hypervisor">
<p>
<a href="papers/bressoud-hypervisor.pdf">Hypervisor-based
  Fault-tolerance</a>
Suppose that instead of connecting both the primary and backup to the
same disk, we connected them to separate disks with identical copies
of the data? Would this work? What else might we have to worry about,
and what are some things that could go wrong?
</div>

<div class="questionbox" ID="q-pnuts">
<p>
<a href="papers/cooper-pnuts.pdf">PNUTS: Yahoo!'s Hosted Data Serving Platform</a>
Briefly explain why it is (or isn't) okay to use relaxed consistency
for social applications (see Section 4). Does PNUTS handle the type of
problem presented by Example 1 in Section 1, and if so, how?
</div>

<div class="questionbox" ID="q-wormhole">
<p><a href="papers/wormhole.pdf">Wormhole</a> stores the datamarker for
a multi-copy reliable delivery (MCRD) flow in Zookeeper, instead of in persistent
storage on the publisher side (as Single-Copy Reliable Delivery does).  Why does
MCRD store the datamarker in Zookeeper?
</div>

<div class="questionbox" ID="q-fb-consistency">
<p><a href="papers/fb-consistency.pdf">Existential Consistency</a>.
Section 2.2 mentions that
<a href="https://en.wikipedia.org/wiki/Causal_consistency">causal consistency</a>
is a non-local
consistency model, and thus can't be checked by the analysis
in Section 3. Give a pair of examples, one legal under causal
consistency and one illegal, that the Section 3 checker would
not be able to distinguish. That is, show by example that the
Section 3 approach can't check causal consistency.
</div>

<div class="questionbox" ID="q-borg">
  <p>
Borg's key benefit for Google is that it efficiently utilizes cluster
resources. As part of this, it is important to avoid resources being
wasted by sitting idle. List and describe at least <i>three</i> techniques
that Borg uses to ensure good machine and cluster utilization. What
potential negative consequences, if any, does each technique have for
(a) non-production/low-priority, and (b) production/high-priority workloads?
</p>
</div>

<div class="questionbox" ID="q-kademlia">
<p>
<a href="papers/kademlia.pdf">Kademlia: A Peer-to-peer Information System
Based on the XOR Metric</a>
Consider a Kademlia-based key-value store with a million users, with non-mutable keys: once a
key is published, it will not be modified. The k/v store experiences a network partition into
two roughly equal partitions A and B for 1.5 hours.

<p>X is a very popular key. Would nodes in both A and B likely be able to
access X's value (1) during the partition? (2) 10 minutes after the network is joined?
(3) 25 hours after the network is joined?

<p>(optional) Would your answer change if X was an un-popular key?
</div>

<div class="questionbox" ID="xqargus">
<p>
<a href="papers/liskov-argus.pdf">Guardians and Actions: Linguistic Support for Robust, Distributed Programs</a>
In Figure 4, <tt>read_mail</tt> deletes email from the mailbox.  If
new mail arrives just before the delete, will the email be deleted but
not returned?  Please briefly explain your answer.
</div>

<div class="questionbox" ID="q-argus88">
<p>
<a href="papers/argus88.pdf">Distributed Programming in Argus</a>.
Starting at the bottom-left of page 310, the paper mentions that
a participant writes new versions to disk <b>twice</b>:
once before replying to a prepare message, and once after receiving
a commit message.
Why are both writes necessary?
What could go wrong if participants replied to the prepare
without writing the disk, instead only writing the disk
after receiving a commit message?
</div>

<div class="questionbox" ID="q-rstar">
<p>
<a href="papers/rstar.pdf">Transaction Management in the R* Distributed Database Management System</a>.
The first two paragraphs of Section 2.1 say that
a subordinate force-writes its disk <b>twice</b>:
once before replying to a PREPARE message, and once after receiving
a COMMIT message.
Why are both writes necessary?
What could go wrong if a subordinate replied to the PREPARE
without writing its disk, and only wrote its disk
after receiving a COMMIT message?
</div>

<div class="questionbox" ID="q-thor95">
<p>
<a href="papers/thor95.pdf">Efficient Optimisitic Concurrency Control
Using Loosely Synchronized Clocks</a>. The paper mentions
that, after a
server commits a transaction, the server sends out invalidation
messages to clients that are caching data written by that transaction. It
may take a while for those invalidations to arrive; during that time,
transactions at other clients may read stale cached data. How does Thor cope
with this situation?
</div>

<div class="questionbox" ID="q-plan9">
<p>
<a href="papers/plan9.pdf">Plan 9 from Bell Labs</a>
List three features introduced by Plan 9 that have not been adopted
by today's common operating systems.  For each, why do you think
the idea hasn't become popular?
</div>

<div class="questionbox" ID="q-mapreduce">
<p>
<a href="papers/mapreduce.pdf">MapReduce</a>
How soon after it receives the first file of intermediate data can a
reduce worker start calling the application's Reduce function?
Explain your answer.
</div>

<div class="questionbox" ID="q-remus">
<p>
The Remus paper's Figure 6 suggests that less frequent checkpoints can lead
to better performance. Of course, checkpointing only every X
milliseconds means that up to X milliseconds of work are lost if the
primary crashes. Suppose it was OK to lose an entire second of work if
the primary crashed. Explain why checkpointing every second would lead
to terrible performance if the application running on Remus were a Web
server.
</div>

<div class="questionbox" ID="q-epaxos">
<p>
When will an EPaxos replica <b>R</b> <em>execute</em> a particular
command <b>C</b>?  Think about when commands are committed, command
interference, read operations, etc.
</div>

<div class="questionbox" ID="q-bft">
<p>
Suppose that we eliminated the pre-prepare phase from the Practical
BFT protocol. Instead, the primary multicasts a PREPARE,v,n,m message
to the replicas, and the replicas multicast COMMIT messages and reply
to the client as before. What could go wrong with this protocol? Give
a short example, e.g., ``The primary sends foo, replicas 1 and 2 reply
with bar...''
</div>


<div class="questionbox" ID="q-txchains">
<p>
Give an example scenario where an application would behave incorrectly
if it waited for just the first hop in a chain to commit, but would
behave correctly by waiting for the entire chain to commit.
</div>

<div class="questionbox" ID="q-realworld">
<p>
Building distributed systems in the real world have both technical
challenges (e.g., dealing with bad data) and non-technical ones (e.g.,
how to structure a team). Which do you think is harder to get right? What
examples can you cite from your personal experience?
</div>

</div>-->

<hr>
<p>Questions or comments regarding 6.824? Send e-mail to <a href=
"mailto:6824-staff@lists.csail.mit.edu"><i>6824-staff@lists.csail.mit.edu</i></a>.</p>

<p><b><a href="questions_12.html#top">Top</a></b> //
<b><a href= "index.html">6.824 home</a></b> //</p>
</body>
</html>
